Learning string distance with smoothing for OCR spelling correction



Related articles

A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction

We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dict...


Context-Based Spelling Correction for Japanese OCR

We present a novel spelling correction method for those languages that have no delimiter between words, such as Japanese, Chinese, and Thai. It consists of an approximate word matching method and an N-best word segmentation algorithm using a statistical language model. For OCR errors, the proposed word-based correction method outperforms the conventional character-based correction me...


Statistical Learning for OCR Text Correction

The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are still prone to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, ...


Approximate string matching algorithms for limited-vocabulary OCR output correction

Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian a...


Conceptual Distance and Automatic Spelling Correction

Text from different sources usually arrives under imperfect conditions. When an anomalous word is detected automatic word recognisers produce a list of candidates from which only one is correct. A variety of techniques have been devised to discriminate among the possible correction candidates. The project we are involved in tries to exploit linguistic knowledge in Spelling Correction...



Journal

Journal title: Multimedia Tools and Applications

Year: 2016

ISSN: 1380-7501,1573-7721

DOI: 10.1007/s11042-016-4185-5